1 TUTORIAL GOALS


The goal of this tutorial is to present the Empirical Bayes (EB) method for analyzing before-after
crash data in a step-by-step format. The tutorial is designed to be used in conjunction with the
companion Empirical Bayes Excel Spreadsheet.
2 EMPIRICAL BAYES METHOD 


The Empirical Bayes (EB) process consists of five steps: determining (1) the safety performance
function, SPF, (2) the overdispersion parameter, $\phi$, (3) the relative weights, $\alpha$, (4) the estimated
expected crashes, $\pi$, and (5) the index of effectiveness, $\theta$. Each of these steps is described more
fully below.


2.1 Determination of the Safety Performance Function, SPF 


The first step in the Empirical Bayes process is to determine a unique Safety Performance 
Function (SPF). The SPF is a mathematical model that predicts an estimate of crash occurrence 
for a given roadway segment (1). According to Hauer, crash occurrence is best modeled using a 
multivariate statistical model (2). A model is simply an equation or set of equations that link the 
expected crash frequency on the roadway to measurable roadway traits such as AADT, length of
roadway segment, roadway width, shoulder width, number of lanes, etc.


The SPF is determined from the data collected in the period before any treatments were made to 
the roadway segment and therefore can consider data available from previously identified “case” 
and “control” sites to increase the size of the sample and enhance the accuracy of the predictive 
model. The SPF can then be used to predict the number of crashes expected to occur each year 
at the “case” sites had there been no improvements to the roadway. Each type of roadway, 
Interstate and non-Interstate, will have different SPFs to predict the expected number of crashes. 
The multivariate statistical model used to establish the SPF can be determined using various 
statistical modeling computer software packages on the market today. LIMDEP Version 7.0 was 
used in this investigation to determine the SPFs for the two types of roadways in question. 
Based on the crash and roadway parameter data available, the SPF was modeled using a multiple
linear regression equation that estimates the number of crashes per three years per roadway
segment that occurred in the “before” period. The multiple linear regression equation is of the
following form:

$$\mathrm{SPF}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1} + \varepsilon_i$$
where:

$\mathrm{SPF}_i$ denotes the dependent variable (crashes per three years for roadway segment i
before any treatment)

$x_{i1}$ through $x_{i,p-1}$ denote the independent, explanatory variables (average annual daily
traffic (AADT), number of lanes, lane width, shoulder width, speed limit, etc.) for
roadway segment i

$\beta_0$ through $\beta_{p-1}$ denote estimable parameters (determined by LIMDEP); $\beta_0$ represents the
y-intercept value and

$\varepsilon_i$ denotes the unexplainable, random error not accounted for in the model.


Using the crash data provided as part of this study, the unique estimated SPFs for Interstate and
non-Interstate roadways are described below and detailed in Tables 1 and 2, respectively. In
each case, average annual daily traffic (AADT) and segment length were found to be significant
in affecting crash occurrence at the 95-percent confidence level (t-statistics of 1.96 or greater
confirm significance):

$$\mathrm{SPF}_i = \beta_0 + \beta_1 (L_i) + \beta_2 (\mathrm{AADT}_i)$$
where:

$L_i$ denotes the length of roadway segment i

$\mathrm{AADT}_i$ denotes the average annual daily traffic over the three-year period on roadway segment i

and all other variables are as previously defined.


Variables such as the number of lanes, lane width, shoulder width, speed limit and others
varied little across the sample and hence did not prove to be significant explanatory
variables for crash occurrence. For example, lane widths were consistently 12 feet for roadway
segments experiencing both high and low crash occurrences, precluding any meaningful
correlation.


Further, key variables specific to reconstruction and pavement preservation treatments (as
compared to direct safety treatments) and their resulting effects on the roadway environment
were omitted from the available data set. For example, data describing roadway surface
condition (e.g., ride quality, surface friction, degree of rutting or cracking) prior to and
following treatment may have proven significant in predicting the change in crash occurrence for
these types of treatments.


Table 1. Model Parameters for Interstate Highways


Variable          | $\beta_0$, $\beta_1$, and $\beta_2$ | t-statistic | Standard Error
y-intercept       | 1.812309 | 3.584 | 0.50568
$L_i$             | 0.108752 | 3.435 | 0.03166
$\mathrm{AADT}_i$ | 0.000167 | 5.189 | 0.00003
$\phi$            | 0.078141 | 2.266 | 0.03448


Table 2. Model Parameters for Non-Interstate Highways


Variable          | $\beta_0$, $\beta_1$, and $\beta_2$ | t-statistic | Standard Error
y-intercept       | 1.207848 | 4.302 | 0.28075
$L_i$             | 0.063425 | 3.506 | 0.01809
$\mathrm{AADT}_i$ | 0.000560 | 4.861 | 0.00012
$\phi$            | 0.182151 | 2.54  | 0.07171
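
The significance check applied to these tables can be reproduced by hand. The sketch below assumes the usual definition of a t-statistic (estimate divided by standard error) and uses three rows copied from Table 1; the function names are illustrative only and are not part of the companion spreadsheet. The AADT row is left out because its standard error is printed to only one significant digit.

```python
# (estimate, standard error) pairs copied from Table 1 (Interstate highways).
TABLE_1 = {
    "y-intercept": (1.812309, 0.50568),
    "length": (0.108752, 0.03166),
    "phi": (0.078141, 0.03448),
}


def t_statistic(estimate: float, standard_error: float) -> float:
    """t-statistic under the usual definition: estimate / standard error."""
    return estimate / standard_error


def is_significant(estimate: float, standard_error: float, critical: float = 1.96) -> bool:
    """True when |t| meets the 95-percent critical value quoted in the text."""
    return abs(t_statistic(estimate, standard_error)) >= critical


for name, (estimate, standard_error) in TABLE_1.items():
    t = t_statistic(estimate, standard_error)
    print(f"{name}: t = {t:.3f}, significant = {is_significant(estimate, standard_error)}")
```

Each of the three rows reproduces the tabulated t-statistic to three decimal places.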


With only two significant variables and a small number of roadway segments in the sample, the
goodness of fit for either model is low; the adjusted $\rho^2$-values for Interstate and non-Interstate
roadway segments are 0.240 and 0.201, respectively. A $\rho^2$-value equal to 1.0 indicates a perfect
model. Also included in Tables 1 and 2 are the estimated overdispersion parameters, $\phi$, for each
roadway type, which are discussed in detail in the next section.
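
As a minimal sketch of how the estimated SPFs are evaluated, the following uses the coefficients from Tables 1 and 2. The function name, the dictionary layout and the example segment (4.8 miles, AADT of 3,600) are hypothetical and chosen only for illustration; segment length is taken to be in miles.

```python
# Coefficients copied from Tables 1 and 2 (crashes per three years per segment).
SPF_COEFFICIENTS = {
    "interstate": {"intercept": 1.812309, "length": 0.108752, "aadt": 0.000167},
    "non_interstate": {"intercept": 1.207848, "length": 0.063425, "aadt": 0.000560},
}


def spf(road_type: str, length_miles: float, aadt: float) -> float:
    """Predicted crashes per three years for one segment (linear SPF)."""
    c = SPF_COEFFICIENTS[road_type]
    return c["intercept"] + c["length"] * length_miles + c["aadt"] * aadt


# Hypothetical Interstate segment: 4.8 miles long, AADT of 3,600.
print(round(spf("interstate", 4.8, 3600), 2))  # 2.94
```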


2.2 Determination of the Overdispersion Parameter, $\phi$


To estimate a roadway’s SPF, it is necessary to assume an underlying probability distribution for
the crash frequencies. Historically, crash frequencies were often assumed to follow a Poisson
distribution. The Poisson distribution assumes that the mean and variance observed for the crash
frequency variable are equal. Studies have shown that the differences between the crash
frequencies and model predictions based on a Poisson distribution are inconsistent, likely
resulting from a violation of this equality assumption (3). Therefore, researchers more
commonly assume a negative binomial distribution to represent the distribution of crash
frequencies (3). One of the parameters used to confirm whether the underlying probability
distribution is correctly identified as negative binomial is the overdispersion parameter, $\phi$. Data
is said to be overdispersed if the variance of the dependent variable exceeds its mean (i.e.,
violating the constraints of the Poisson distribution). For both Interstate and non-Interstate
roadways considered in this investigation, the data was confirmed to be overdispersed (the
variance of the crash frequency variable exceeded the mean) as evidenced by a statistically
significant $\phi$-value at the 95-percent confidence level.
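
The definition of overdispersion above (variance exceeding the mean) can be illustrated with a simple descriptive check. This is only a sketch: the crash counts are hypothetical, and comparing the sample variance with the sample mean is a cruder check than the significance test on the estimated overdispersion parameter used in this investigation.

```python
from statistics import mean, variance


def is_overdispersed(crash_counts: list) -> bool:
    """True when the sample variance of the counts exceeds their mean."""
    return variance(crash_counts) > mean(crash_counts)


# Hypothetical three-year crash counts for eight roadway segments.
counts = [2, 0, 5, 1, 9, 3, 0, 12]
print(mean(counts), round(variance(counts), 2), is_overdispersed(counts))
```

For these hypothetical counts the mean is 4 and the sample variance is about 19.4, so a Poisson assumption (mean equal to variance) would be hard to defend.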


Proceeding with the EB method, this overall overdispersion parameter, representing all roadway
segments in combination, is next used to account for varying degrees of overdispersion
between roadway segments attributable to differences in roadway traits and crash occurrences. If
each roadway segment were of equal length and had consistent geometric characteristics, traffic
characteristics, etc., the overall overdispersion parameter would be directly applicable to each
individual roadway segment. However, since the roadway segments vary in length and
characteristics, a unique overdispersion parameter, $\phi_i$, must be determined for each roadway
segment. Segment length is assumed to be a primary determinant affecting individual
overdispersion parameter values. Under this assumption, using the overall overdispersion
parameter as the overdispersion of each individual segment would skew the model by placing
more emphasis on the longer roadway segments (3). To better estimate the expected number of
crashes for each individual roadway segment, the overdispersion parameter can be adjusted
based on length to represent the individual segment, i:

$$\phi_i = \phi \cdot L_i^{\beta}$$
where:

$\phi_i$ denotes the adjusted overdispersion parameter for roadway segment i

$\phi$ denotes the overall overdispersion parameter for all combined roadway segments

$L_i$ denotes the length of roadway segment i

$\beta$ is a constant between 0 and 1 (3).


The $\beta$-value takes into account the differences in geometric characteristics, traffic
characteristics, etc. between the individual roadway segments; if each roadway segment was
completely dissimilar from other roadway segments (i.e., had no characteristic similarities to the
other roadway segments), $\beta$ = 0 and the roadway segment in question would be represented by
the overall overdispersion parameter. Alternately, if each of the roadway segments had exactly
the same characteristics as all other segments of the roadway, $\beta$ = 1 and the overdispersion for
the roadway segment in question would be represented by the overall overdispersion parameter
adjusted only by the segment length. A $\beta$-value somewhere between zero and one is most
representative for segments defined along a continuous roadway (3). However, a $\beta$-value equal
to 1 was assumed for this study to provide the most conservative estimates of future crash
occurrences.
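
The length-based adjustment can be sketched as follows. The function name and the 4.8-mile example segment are hypothetical; the overall parameter is the Interstate value from Table 1, and the default of 1 for the exponent mirrors the assumption made in this study.

```python
def overdispersion_by_length(phi: float, length_miles: float, beta: float = 1.0) -> float:
    """Segment-specific overdispersion: phi_i = phi * L_i ** beta."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta is expected to lie between 0 and 1")
    return phi * length_miles ** beta


PHI_INTERSTATE = 0.078141  # overall overdispersion parameter, Table 1

# Hypothetical 4.8-mile segment with beta = 1 (the study's assumption).
print(round(overdispersion_by_length(PHI_INTERSTATE, 4.8), 2))  # 0.38

# With beta = 0 the segment simply inherits the overall parameter.
print(overdispersion_by_length(PHI_INTERSTATE, 4.8, beta=0.0))
```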


An alternative method for determining the adjusted overdispersion parameter assumes a unique 
gamma distribution for each roadway segment, i (3). The individual overdispersion parameters 
using this alternate method can be calculated as follows: 

$$\phi_i = \phi \cdot \mathrm{SPF}_i^{\gamma}$$


where $\gamma$ is a constant between 0 and 1 and all other variables are as previously defined (3).


If the parameter $\gamma$ is set to zero, then the standard negative binomial model is obtained. If $\gamma$ is
greater than zero, then the variance of the gamma distribution decreases as $\mathrm{SPF}_i$ increases (3).
This method of analysis has been used by a number of researchers to determine the individual
overdispersion parameter for before-after studies and likely yields more accurate results;
however, determination of the $\gamma$-value requires analysis not typically employed in practice.
Miaou and Lum modeled crash occurrence on rural interstate highways using $\gamma$ = 1 (4).
Previously published $\gamma$-values for similar roadway types may be transferable. Hence, this
investigation also assumed $\gamma$ = 1.
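
A minimal sketch of this alternative adjustment is shown below. The function name and the example SPF value (2.94 crashes per three years) are hypothetical; the overall parameter is the Interstate value from Table 1.

```python
def overdispersion_by_spf(phi: float, spf_value: float, gamma: float = 1.0) -> float:
    """Segment-specific overdispersion: phi_i = phi * SPF_i ** gamma."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is expected to lie between 0 and 1")
    return phi * spf_value ** gamma


PHI_INTERSTATE = 0.078141  # overall overdispersion parameter, Table 1

# Hypothetical segment with an SPF prediction of 2.94 crashes per three years.
print(round(overdispersion_by_spf(PHI_INTERSTATE, 2.94), 2))  # 0.23

# gamma = 0 recovers the overall parameter (standard negative binomial case).
print(overdispersion_by_spf(PHI_INTERSTATE, 2.94, gamma=0.0))
```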


With no superior method for determining individual overdispersion parameters emerging, this
investigation used both methods and carried the two sets of results forward throughout the
remainder of the EB process. Tables 3, 4 and 5 summarize the results.


Table 3. Overdispersion Parameters and Relative Weights for Interstate Roadway Segments


Columns 3-4 use $\phi_i = \phi L_i$; columns 5-6 use $\phi_i = \phi\,\mathrm{SPF}_i$.

PROJECT I.D.   | PROJECT NAME                   | Overdispersion Parameter | Relative Weight | Overdispersion Parameter | Relative Weight
I-15-2-(70)116 | BUXTON INTERCHANGE - N & S     | 0.38 | 0.114 | 0.23 | 0.072
I-15-4(75)200  | LINCOLN ROAD - SIEBEN          | 1.25 | 0.233 | 0.32 | 0.072
I-15-6(28)323  | BRADY - NO. & SO. (NORTHBOUND) | 0.92 | 0.202 | 0.28 | 0.072
I-90-1(119)74  | ALBERTON - EAST & WEST         | 0.83 | 0.167 | 0.32 | 0.072
I-90-5(53)240  | PIPESTONE EAST & WEST          | 0.66 | 0.147 | 0.30 | 0.072
I-90-7(70)341  | MISSION INTERCHANGE - EAST     | 0.88 | 0.172 | 0.33 | 0.072
I-90-9(81)503  | DUNMORE - SOUTH                | 0.42 | 0.113 | 0.26 | 0.072
I-90-8(131)450 | 27TH ST. - LOCKWOOD            | 0.28 | 0.050 | 0.41 | 0.072
I-94-3(50)115  | 5.3 KM WEST OF HATHAWAY - EAST | 1.04 | 0.215 | 0.30 | 0.072
I-94-4(56)129  | MILES CITY - EAST & WEST       | 0.99 | 0.205 | 0.30 | 0.072
I-94-4(57)143  | BAKER INTERCHANGE - EAST       | 0.39 | 0.119 | 0.23 | 0.072
I-94-5(27)163  | PRAIRIE COUNTY LINE - EAST     | 0.50 | 0.143 | 0.23 | 0.072
I-94-6(45)191  | DAWSON COUNTY LINE - EAST      | 1.48 | 0.252 | 0.34 | 0.072


Table 4. Overdispersion Parameters and Relative Weights for Highway Reconstruction Roadway
Segments


Columns 3-4 use $\phi_i = \phi L_i$; columns 5-6 use $\phi_i = \phi\,\mathrm{SPF}_i$.

PROJECT I.D.   | PROJECT NAME                   | Overdispersion Parameter | Relative Weight | Overdispersion Parameter | Relative Weight
STPP13-1(22)0  | REYNOLDS PASS - NORTH          | 1.57 | 0.355 | 0.52 | 0.154
STPP14-2(12)33 | WHITE SULPHUR SPRINGS - SOUTH  | 1.63 | 0.357 | 0.54 | 0.154
NH16-1(35)23   | YELLOWSTONE CO. LINE (N. - S.) | 1.16 | 0.286 | 0.53 | 0.154
STPP52-2(20)40 | CRESTON NORTH                  | 1.41 | 0.273 | 0.68 | 0.154
NH1-1(37)69    | HAPPY'S INN E & W              | 2.11 | 0.391 | 0.60 | 0.154
STPP13-1(19)65 | NORRIS - HARRISON              | 1.73 | 0.366 | 0.55 | 0.154
NH53-1(18)16   | ACTON - NORTHWEST              | 2.09 | 0.384 | 0.61 | 0.154


Table 5. Overdispersion Parameters and Relative Weights for Highway Preservation Roadway
Segments


Columns 3-4 use $\phi_i = \phi L_i$; columns 5-6 use $\phi_i = \phi\,\mathrm{SPF}_i$.

PROJECT I.D.   | PROJECT NAME              | Overdispersion Parameter | Relative Weight | Overdispersion Parameter | Relative Weight
NH1-8(20)72    | MALTA - SACO              | 4.99 | 0.576 | 0.67 | 0.154
STPN5-2(81)79  | ELMO - NORTH              | 2.48 | 0.385 | 0.72 | 0.154
NH11-1(30)14   | YANKEE JIM CANYON - NORTH | 1.93 | 0.405 | 0.52 | 0.154
NH11-1(31)24   | EMIGRANT NORTH - SOUTH    | 1.81 | 0.379 | 0.54 | 0.154
STPP13-1(27)24 | McATEE (NORTH - SOUTH)    | 2.12 | 0.445 | 0.48 | 0.154
STPN24-1(48)32 | CLEARWATER JCT. - EAST    | 4.35 | 0.529 | 0.71 | 0.154
NH37-2(19)62   | ASHLAND - EAST            | 2.59 | 0.500 | 0.47 | 0.154


2.3 Determination of the Relative Weight, $\alpha$


To adjust for varying degrees of overdispersion, a relative weight, $\alpha_i$, is applied to each roadway
segment. The segment-specific relative weight is determined as follows:

$$\alpha_i = \frac{1}{1 + \mathrm{SPF}_i / \phi_i}$$

where $\alpha_i$ denotes the relative weight applied to roadway segment i and all other variables are as
previously defined (1). The roadway segment relative weights for this investigation are provided
in Tables 3 through 5.
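
The weight calculation can be sketched in a few lines. The function name and the SPF values are hypothetical; the overall overdispersion parameters are those in Tables 1 and 2. One consequence worth noting: when the segment parameter is taken as the overall parameter multiplied by the SPF value (exponent of 1), the SPF cancels and the weight reduces to the same constant for every segment of a roadway type, which is consistent with the identical weights of 0.072 and 0.154 tabulated for that method.

```python
def relative_weight(spf_value: float, phi_i: float) -> float:
    """Relative weight: alpha_i = 1 / (1 + SPF_i / phi_i)."""
    return 1.0 / (1.0 + spf_value / phi_i)


PHI_INTERSTATE = 0.078141      # Table 1
PHI_NON_INTERSTATE = 0.182151  # Table 2

# With phi_i = phi * SPF_i the weight no longer depends on the SPF value.
for spf_value in (2.0, 5.0, 10.0):  # hypothetical SPF predictions
    print(round(relative_weight(spf_value, PHI_INTERSTATE * spf_value), 4))

# With phi_i = phi * L_i the weight varies by segment (hypothetical 4.8-mile segment).
print(round(relative_weight(2.94, PHI_INTERSTATE * 4.8), 3))
```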


2.4 Determination of Estimated Expected Crashes, $\pi$


Once the previous steps have been completed, the estimate of the expected crashes for a given
roadway segment can be calculated using the following equation (1):

$$\pi_i = \alpha_i \cdot \mathrm{SPF}_i + (1 - \alpha_i) \cdot A_i$$
where:

$\pi_i$ denotes the expected number of crashes per three years on roadway segment i

$A_i$ denotes the actual number of crashes per three years on roadway segment i

and all other variables are as previously defined.
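
The weighted combination of the SPF prediction and the actual crash count can be sketched as follows. The function name and the input values are hypothetical and chosen so the arithmetic is easy to follow.

```python
def expected_crashes(alpha: float, spf_value: float, actual_crashes: float) -> float:
    """EB estimate: pi_i = alpha_i * SPF_i + (1 - alpha_i) * A_i."""
    return alpha * spf_value + (1.0 - alpha) * actual_crashes


# Hypothetical segment: weight 0.25, SPF prediction 4.0, 8 actual crashes.
print(expected_crashes(0.25, 4.0, 8))  # 7.0
```

A small weight pulls the estimate toward the actual count; a weight near 1 pulls it toward the SPF prediction.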


2.5 Determination of the Index of Effectiveness, $\theta$


The last step in the EB process is to express the resulting effectiveness of any treatment (e.g.,
roadway reconstruction and pavement preservation improvements, safety improvements) as
a relative difference in crash occurrence between actual and expected. With the expected crash
occurrence determined in the previous step and the actual crash occurrence observed, the
difference can be calculated directly. However, this direct calculation method does not account
for the uncertainty resulting from (1) sampling such a small number of projects to represent the
larger population, (2) the resulting low explanatory power (i.e., goodness of fit) of the SPF, (3)
the assumptions supporting the determination of the overdispersion parameters and relative
weights and (4) the overall underlying data variability project to project. Instead, an index of
effectiveness, $\theta_i$, that takes into account this uncertainty through the data variance observed for
each roadway segment must be determined. The variance, $\sigma_i^2$, can be calculated as follows:

$$\sigma_i^2 = (1 - \alpha_i) \cdot \pi_i$$


where $\alpha_i$ and $\pi_i$ are as previously defined.


The variance of the data can also be calculated using the following equation: 
o; = SPF, hs | 


i i 


where all variables are as previously defined. 


The index of effectiveness is a function of the previous parameters given by (8):

$$\theta_i = \frac{A_i / \pi_i}{1 + \sigma_i^2 / \pi_i^2}$$


where $\theta_i$ denotes the index of effectiveness and all other variables are as previously defined.


Finally, the relative difference in crash occurrence between actual and expected conditions is
determined as (5):

$$\text{relative difference in crash occurrence} = 100\,(1 - \theta_i)$$


where all variables are as previously defined and results are expressed as a percentage.
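
The last step can be sketched end to end for a single segment. The function names and the input values (weight 0.25, 7.0 expected crashes, 5 actual crashes) are hypothetical; the variance used is the first of the two expressions given in this section, the weight complement multiplied by the expected crashes.

```python
def eb_variance(alpha: float, expected: float) -> float:
    """Variance of the EB estimate: sigma_i^2 = (1 - alpha_i) * pi_i."""
    return (1.0 - alpha) * expected


def index_of_effectiveness(actual: float, expected: float, variance: float) -> float:
    """theta_i = (A_i / pi_i) / (1 + sigma_i^2 / pi_i^2)."""
    return (actual / expected) / (1.0 + variance / expected ** 2)


def relative_difference_percent(theta: float) -> float:
    """Relative difference in crash occurrence, in percent: 100 * (1 - theta)."""
    return 100.0 * (1.0 - theta)


# Hypothetical segment: weight 0.25, 7.0 expected crashes, 5 actual crashes.
variance = eb_variance(0.25, 7.0)                  # 5.25
theta = index_of_effectiveness(5, 7.0, variance)   # 20/31, about 0.645
print(round(relative_difference_percent(theta), 1))  # 35.5
```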
3 USING THE EB EXCEL SPREADSHEET 


An Excel spreadsheet is provided to facilitate application of the Empirical Bayes process,
previously outlined, for: (1) Interstate, (2) two-lane highway preservation and (3) two-lane
highway restoration projects. The SPF and overall overdispersion parameter have already been
established for each roadway type as part of this larger study.


The first component of the spreadsheet provides an overview of the EB process as shown in
Figure 1. The second component of the spreadsheet consists of two tables that determine the
effectiveness of roadway treatments (based on the two alternate overdispersion parameter, $\phi_i$,
calculations) using the process outlined in Section 2. The user is required to input values into the
yellow highlighted cells in the worksheet shown in Figure 2. Once the information is entered in
the spreadsheet, the percent relative difference between actual and expected crashes will
automatically be determined. The information required for the spreadsheet includes:


• Project ID
• Project Name
• Milepost at the beginning of the project segment (MP Begin) and the milepost at the end
  of the project segment (MP End)
• AADT on the segment
• Number of actual crashes on the segment


[Figure 1: screenshot of the spreadsheet's overview worksheet. It states the Safety Performance
Function determined for Interstate highways, SPF = 1.81231 + 0.10875(L) + 0.00016707(AADT),
with overall overdispersion parameter $\phi$ = 0.07814, where SPF is the number of crashes per three
years per roadway segment, L is the length of the roadway segment in miles and AADT is the
average annual daily traffic over a three-year period. It then lists the steps of the EB analysis
process described in Section 2.]


Figure 1. Overview of the EB Method
[Figure 2: screenshot of the Interstate worksheet showing the two "Interstate Resurfacing EB
Results" tables, one for each overdispersion parameter calculation, with the user input cells and
computed columns such as segment length, overdispersion parameter, relative weight and
expected accidents.]


Figure 2. User Inputs


4 INTERPRETING THE RESULTS 


For a safety improvement to be noted, the number of crashes expected to occur on a roadway
segment had no treatment been made should exceed the actual number of crashes observed on
the roadway segment following treatment. The last two columns in the spreadsheet indicate if
the actual number of crashes that occurred on each roadway segment is higher or lower than the
number of crashes expected to occur without treatment. If the actual number of crashes
occurring was lower, the results will be tabulated in the % Lower than Expected column and the
text NA (not applicable) will appear in the % Higher than Expected column. The reverse effect
will occur if the actual number of observed crashes exceeds the expected crash frequency. The
results for all roadway reconstruction projects in combination are highlighted in green in the
spreadsheet and are located at the bottom of the % Lower than Expected and the % Higher than
Expected columns (see Figure 3).
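
The two result columns can be mimicked with a short sketch. This is an assumption about the spreadsheet's logic rather than a copy of its cell formulas: the column is chosen here from the sign of the relative difference, 100(1 - theta), and a difference of exactly zero is placed in the % Lower than Expected column. The function name is hypothetical.

```python
def result_columns(theta: float) -> tuple:
    """Return (% Lower than Expected, % Higher than Expected) for one row.

    Assumption: the column is selected by the sign of 100 * (1 - theta).
    """
    difference = 100.0 * (1.0 - theta)
    if difference >= 0:
        return (round(difference, 1), "NA")
    return ("NA", round(-difference, 1))


print(result_columns(20 / 31))  # (35.5, 'NA')
print(result_columns(1.2))      # ('NA', 20.0)
```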
[Figure 3: screenshot of the Interstate worksheet showing the two "Interstate Resurfacing EB
Results" tables with the relative difference columns.]


Figure 3. Relative Difference Display


5 LIMITATIONS OF THE EB METHOD SPREADSHEET 


The primary limitation of the EB method as applied in this spreadsheet is that the SPF was
estimated using an aggregate three years of crash data (i.e., crash frequencies per three years per
roadway segment). Hence, to accurately apply this SPF model, the units of crash frequencies per
three years per roadway segment need to be maintained (i.e., annual crash data cannot be used in
place of three-year aggregated data without re-estimating the SPF model).
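
When crash records are kept by year, one way to respect this limitation is to total exactly three years of counts before entering them. The helper below is a hypothetical sketch and not part of the spreadsheet; the annual counts are made up for illustration.

```python
def three_year_total(annual_counts) -> int:
    """Sum exactly three annual crash counts into one three-year total."""
    counts = list(annual_counts)
    if len(counts) != 3:
        raise ValueError("the SPF expects exactly three years of crash data")
    return sum(counts)


# Hypothetical annual crash counts for one segment.
print(three_year_total([4, 6, 5]))  # 15
```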




A more minor limitation pertaining to the use of the spreadsheet occurs when new rows of
information are added into Excel. All of the calculations for individual roadway segments will
remain unaltered; however, the final percent relative difference in crashes by roadway type could
be miscalculated if the user does not modify the cell formula to include the additional rows. The
user should double-check that the totals of the Actual Crashes, the Expected Crashes, and
the Variance columns include every row, so that the final results are accurate.
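
An independent check of the combined result can be sketched outside Excel. The sketch assumes that the combined index of effectiveness applies the segment-level formula to the column totals (total actual crashes, total expected crashes and total variance); this mirrors the three summed columns named above but is an assumption rather than a confirmed copy of the spreadsheet formula. The function name and the row values are hypothetical.

```python
def overall_relative_difference(rows) -> float:
    """Combined percent difference from (actual, expected, variance) rows.

    Assumption: the segment-level index of effectiveness is applied to the
    column totals.
    """
    rows = list(rows)
    total_actual = sum(row[0] for row in rows)
    total_expected = sum(row[1] for row in rows)
    total_variance = sum(row[2] for row in rows)
    theta = (total_actual / total_expected) / (1.0 + total_variance / total_expected ** 2)
    return 100.0 * (1.0 - theta)


# Hypothetical worksheet rows: (actual crashes, expected crashes, variance).
rows = [(5, 7.0, 5.25), (3, 4.0, 2.0)]
print(round(overall_relative_difference(rows), 1))  # 31.4
```

Because the function takes every row it is given, adding a segment to the list automatically includes it in all three totals.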